Accurate identification of polyadenylation sites from 3′ end deep sequencing using a naïve Bayes classifier

Authors

  • Sarah Sheppard
  • Nathan D. Lawson
  • Lihua Julie Zhu
Abstract

Motivation: 3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3′ ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Although heuristic filters have been applied in these cases, they typically result in a high proportion of both false-positive and false-negative classifications. Therefore, there is a need to develop improved algorithms to better identify mis-priming events in oligo-dT primed sequences.

Results: By analyzing sequence features flanking 3′ ends derived from oligo-dT-based sequencing, we developed a naïve Bayes classifier to classify them as true or false/internally primed. The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Received on May 5, 2013; revised on June 25, 2013; accepted on
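To make the classification idea in the Results concrete, here is a minimal sketch of a naïve Bayes classifier that labels candidate 3′ ends as true or internally primed. It is not the authors' implementation: the two binary features (an upstream AATAAA hexamer and an A-rich downstream window), the threshold of 6 adenines in 10 nt, the function and class names, and the toy training sequences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def features(upstream, downstream):
    """Hypothetical binary features for one candidate 3' end (illustrative only)."""
    return {
        # Canonical polyadenylation signal somewhere in the upstream flank.
        "has_AATAAA_upstream": "AATAAA" in upstream,
        # Genomic A-rich stretch right after the site suggests internal priming.
        "a_rich_downstream": downstream[:10].count("A") >= 6,
    }


class NaiveBayes:
    """Bernoulli naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_count = Counter(y)
        self.feature_names = sorted(X[0])
        # true_count[c][f] = number of class-c examples where feature f is True
        self.true_count = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            for name in self.feature_names:
                if feats[name]:
                    self.true_count[label][name] += 1
        return self

    def log_posteriors(self, feats):
        """Unnormalized log posterior per class, treating features as independent."""
        total = sum(self.class_count.values())
        scores = {}
        for c in self.classes:
            n = self.class_count[c]
            score = math.log(n / total)  # log prior
            for name in self.feature_names:
                p_true = (self.true_count[c][name] + 1) / (n + 2)
                score += math.log(p_true if feats[name] else 1 - p_true)
            scores[c] = score
        return scores

    def predict(self, feats):
        scores = self.log_posteriors(feats)
        return max(scores, key=scores.get)


# Toy training set: (upstream flank, downstream flank, label). Invented data.
train = [
    ("GCTAATAAAGTCTGA", "GTCTGTGTTTCC", "true"),
    ("TTGAATAAACCTGCA", "TGGCTCTTGTGA", "true"),
    ("CCAATAAATGTGCTA", "CTTGGTACAGTC", "true"),
    ("GCTGCCTGGAGTCCA", "AAAAAAGAAAAG", "internal"),
    ("TGCCTGAGGCTGGAC", "AAAAAAAAAAGA", "internal"),
    ("CAGGCTGGTCTCGAA", "AAAGAAAAAAAT", "internal"),
]

X = [features(up, down) for up, down, _ in train]
y = [label for _, _, label in train]
model = NaiveBayes().fit(X, y)

print(model.predict(features("TTCAATAAAGGC", "GTGTCTTGCC")))
print(model.predict(features("GGCTGCAGGTCC", "AAAAAAAAGAAA")))
```

Unlike a hard heuristic filter on a single feature, this kind of model combines evidence from several flanking-sequence features into one score, so a site with an A-rich downstream region can still be called true if other features support it.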

Similar resources

Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation

Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated r...

Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier

Motivation: Most computational methodologies for microRNA gene prediction utilize techniques based on sequence conservation and/or structural similarity. In this study we describe a new technique, which is applicable across several species, for predicting miRNA genes. This technique is based on machine learning, using the Naïve Bayes classifier. It automatically generates a model from the train...

Learning Naïve Bayes Classifiers From Attribute Value Taxonomies and Partially Specified Data

Partially specified data are commonplace in many practical applications of machine learning where different instances are described at different levels of precision relative to an attribute value taxonomy (AVT). This paper describes AVTNBL, an extension of the Naïve Bayes Learning algorithm that effectively exploits user-supplied attribute value taxonomies to construct compact and accurate Naïve...

Towards Biometric Person Identification using fNIRS

We investigate the potential of using fNIRS signals for biometric person identification. Independent sessions for training and testing have been recorded using 8 channels of frontal fNIRS. We extract logarithmic power spectral densities as features to train and test a Naïve Bayes Classifier. We evaluate different frequency bands and report classification results for different trial lengths.

Publication date: 2013